Allow name query for MPI_DATATYPE_NULL. #10029

bosilca · 2022-02-19T19:16:36Z

Signed-off-by: George Bosilca [email protected]

jsquyres

In principle, I don't have a problem with this PR. But is it mandated by the MPI spec, or is this an extension?

In the description for MPI_TYPE_GET_NAME in MPI-4.0 (p385 -- or anywhere in section 7.8, "Naming Object" starting on p381), it doesn't explicitly mention that you can pass invalid handles (like MPI_DATATYPE_NULL) to the MPI_*_GET_NAME functions. Indeed, MPI-4.0 p383:33-36 somewhat kinda sorta implies that you can't:

Thus, the names of MPI_COMM_WORLD, MPI_COMM_SELF, and the communicator returned by MPI_COMM_GET_PARENT (if not MPI_COMM_NULL) will have the default of "MPI_COMM_WORLD", "MPI_COMM_SELF", and "MPI_COMM_PARENT".

That being said, if this issue is ambiguous in the MPI standard text, I'm in favor of this functionality because it's one less check that the application program needs to make before calling MPI_*_GET_NAME.

What do we do for the other MPI_*_GET_NAME functions?

Do we know what MPICH does here?

bosilca · 2022-02-20T16:59:04Z

The MPI standard define the name for all predefined datatypes, and MPI_DATATYPE_NULL is part of that group. Thus, I see no reason for the current in MPI_TYPE_GET_NAME that prevents the retrieval of its name.

jsquyres · 2022-02-20T17:10:07Z

I guess I'm a little confused. Can you cite where in MPI-4.0 it says that invalid handles such as MPI_DATATYPE_NULL should have a name returned from MPI_TYPE_GET_NAME? The only description I see under MPI_TYPE_GET_NAME (MPI-4.0 p385:23-24) says:

Named predefined datatypes have the default names of the datatype name. For example, MPI_WCHAR has the default name of "MPI_WCHAR".

It doesn't say that invalid handles are accepted by MPI_TYPE_GET_NAME. What am I missing?

bosilca · 2022-02-20T17:36:55Z

MPI_DATATYPE_NULL is not an invalid datatype, it is a NULL handle, and as such it is a predefined constant. Thus, I believe that the naming rule cited above applies.

Forcing a library writer to check for MPI_DATATYPE_NULL before calling MPI_TYPE_GET_NAME is as sensical having printf not allowing the printing of NULL pointers.

bosilca · 2022-02-22T00:23:56Z

The other get_name functions do not suffer from the same problem. This PR is good to be merged.

jsquyres · 2022-02-22T13:28:19Z

The other get_name functions do not suffer from the same problem. This PR is good to be merged.

Yes they do. MPI_COMM_GET_NAME calls:

ompi/ompi/mpi/c/comm_get_name.c

Line 55 in 580984b

if ( ompi_comm_invalid ( comm ) )

which is

ompi/ompi/communicator/communicator.h

Lines 463 to 493 in 580984b

    
           /** 
        
            * Is this a valid communicator?  This is a complicated question. 
        
            * :-) 
        
            * 
        
            * According to MPI-1:5.2.4 (p137): 
        
            * 
        
            * "The predefined constant MPI_COMM_NULL is the value used for 
        
            * invalid communicator handles." 
        
            * 
        
            * Hence, MPI_COMM_NULL is not valid.  However, MPI-2:4.12.4 (p50) 
        
            * clearly states that the MPI_*_C2F and MPI_*_F2C functions 
        
            * should treat MPI_COMM_NULL as a valid communicator -- it 
        
            * distinctly differentiates between "invalid" handles and 
        
            * "MPI_*_NULL" handles.  Some feel that the MPI-1 definition 
        
            * still holds for all other MPI functions; others feel that the 
        
            * MPi-2 definitions trump the MPI-1 definition.  Regardless of 
        
            * who is right, there is ambiguity here.  So we have left 
        
            * ompi_comm_invalid() as originally coded -- per the MPI-1 
        
            * definition, where MPI_COMM_NULL is an invalid communicator. 
        
            * The MPI_Comm_c2f() function, therefore, calls 
        
            * ompi_comm_invalid() but also explictily checks to see if the 
        
            * handle is MPI_COMM_NULL. 
        
            */ 
        
           static inline int ompi_comm_invalid (const ompi_communicator_t* comm) 
        
           { 
        
               if ((NULL == comm) || (MPI_COMM_NULL == comm) || 
        
                   (OMPI_COMM_IS_FREED(comm)) || (OMPI_COMM_IS_INVALID(comm)) ) 
        
                   return true; 
        
               else 
        
                   return false; 
        
           }

which explicitly checks for MPI_COMM_NULL. There's a lengthy comment there explaining that there is ambiguity about MPI_COMM_NULL -- it is "invalid" or is it just a NULL (but valid) handle? It looks like that issue has not been resolved in the MPI standard.

Additionally, MPI_WIN_GET_NAME invokes ompi_win_invald():

ompi/ompi/mpi/c/win_get_name.c

Line 47 in 580984b

if (ompi_win_invalid(win)) {

which is

ompi/ompi/win/win.h

Lines 152 to 164 in 580984b

    
           /* Note that the defintion of an "invalid" window is closely related 
        
              to the defintion of an "invalid" communicator.  See a big comment 
        
              in ompi/communicator/communicator.h about this. */ 
        
           static inline int ompi_win_invalid(ompi_win_t *win) { 
        
               if (NULL == win || 
        
                   MPI_WIN_NULL == win || 
        
                   (OMPI_WIN_INVALID & win->w_flags) || 
        
                   (OMPI_WIN_FREED & win->w_flags)) { 
        
                   return true; 
        
               } else { 
        
                   return false; 
        
               } 
        
           }

and explicitly checks for MPI_WIN_NULL.

Just to be 100% sure, I tested this with a trivial program:

#include <stdio.h>
#include <mpi.h>

int main()
{
    int len;
    char name[MPI_MAX_OBJECT_NAME];

    MPI_Init(NULL, NULL);
#if 0
    MPI_Type_get_name(MPI_DATATYPE_NULL, name, &len);
#elif 0
    MPI_Comm_get_name(MPI_COMM_NULL, name, &len);
#else
    MPI_Win_get_name(MPI_WIN_NULL, name, &len);
#endif
    printf("Got name: %s\n", name);
    MPI_Finalize();
    return 0;
}

I ran it with all 3 possibilities and Open MPI -- and MPICH 4.0 -- generate an error for all 3 MPI_*_GET_NAME with MPI_*_NULL cases.

Given that there is some ambiguity here, I think we should have the same behavior for all of the MPI_*_GET_NAME functions: they should all either accept the MPI_*_NULL handles or they should all disallow the MPI_*_NULL handles. Currently, they do the latter. If we want to change to the former, that's probably fine, but a) we should be consistent, and b) we should probably also document this behavior.

@bosilca if you fix up all the MPI_*_GET_NAME functions, I'll fix up the docs.

This applies to MPI_DATATYPE_NULL, MPI_COMM_NULL and MPI_WIN_NULL. MPI 4.0 states that "The predefined constant MPI_COMM_NULL is the value used for invalid communicator handles." Hence, we might assume that MPI_COMM_NULL is an invalid communicator, but that is only partially true because in 4.12.4 it is clearly stated that the translation function, MPI_*_C2F and MPI_*_F2C, should treat MPI_COMM_NULL as a valiid communicator. So, the real role of these MPI_*_NULL objects might be to point to valid MPI objects used as bounds, but they represent valid MPI objects that cannot be used for communications. In any case, as predefined handles they have an associated name, and the user _must_ be able to get this name. Signed-off-by: George Bosilca <[email protected]>

bosilca · 2022-02-23T15:53:48Z

Our implementation of ompi_*_invalid is, well, invalid. There is a lengthy comment in communicator.h on why we're doing the wrong thing (according to MPI 2.0), but we're fine with it because we have implemented shortcuts in the few places where it was needed (basically f2c and c2f functions). Anyway, if we assume that invalid means, improper for communications, then we should be good for now.

jsquyres

@bosilca and I chatted on the phone about this this morning. Here's the conclusions we came to:

We both agree that the MPI_*_GET_NAME functions should accept the MPI_*_NULL handles. It's just better for users.
From that perspective, we should accept this PR.
From the portability perspective, we should encourage MPICH to adopt this behavior as well. Because it's better for users.
For simplicity (assuming this PR is accepted), let's bring it to v5.0. But let's not backport it (i.e., do not bring it to v4.1.x or v4.0.x).
If the MPI Forum accepts mpi-forum/mpi-issues#544, 🎉 . If not, we can revert this PR.

jsquyres · 2022-02-23T16:27:24Z

ompi/win/win.c


    /* Setup MPI_WIN_NULL */
    OBJ_CONSTRUCT(&ompi_mpi_win_null.win, ompi_win_t);
-    ompi_mpi_win_null.win.w_flags = OMPI_WIN_INVALID;


Is this safe to do? Are there other places that rely on this flag being set upon construction?

The only use I could find is in https://github.com/open-mpi/ompi/blob/master/ompi/win/win.h#L155 where the window is explicitly checked for MPI_WIN_NULL as well.

All functions check the validity of the window using ompi_win_invalid, a functions that specifically checks for MPI_WIN_NULL (in addition to checking the OMPI_WIN_INVALID flag). No other usages of OMPI_WIN_INVALID exists in the code.

It is in fact the same story with all the _INVALID flags so it would be legitimate to wonder why do we need the _INVALID flag at all, but I don't think we want to have this discussion in the context of this PR.

If OMPI_WIN_INVALID is only used in 1) the constructor and 2) in ompi_win_invalid(), and you just removed it out of the constructor, then we should also remove it out of ompi_win_invalid(), too. Otherwise it's a dangling check for something that can never be true.

That's exactly what I said above (and it is true for all uses of _INVALID). But I don't think it should be part of this PR.

bwbarrett · 2022-02-23T17:20:27Z

I disagree with this PR. I think our current behavior is required by the standard and don't believe we should commit this without clarification from the MPI Forum. I think this is the desired behavior, but I also believe it violates the MPI standard (meaning we shouldn't do it without clarification on the standard).

MPI_COMM_GET_NAME (and friends) operate on a communicator. MPI_COMM_NULL is not a communicator; it's the NULL handle. The first sentence of MPI_COMM_GET_NAME is:

MPI_COMM_GET_NAME returns the last name which has previously been associated 22 with the given communicator.

Unless we all agree that you can call MPI_COMM_SET_NAME on MPI_COMM_NULL, and I would claim that the definition of MPI_COMM_SET_NAME:

MPI_COMM_SET_NAME allows a user to associate a name string with a communicator.

prohibits calling SET_NAME on MPI_COMM_NULL because, again, MPI_COMM_NULL is not a communicator, then we can't have GET_NAME handle the NULL handles and conform to the MPI specification.

The bit in MPI_COMM_GET_NAME about predefined communicators is also not ambiguous today. It says:

The three predefined communicators will have predefined names associated with them. Thus, the names of MPI_COMM_WORLD, MPI_COMM_SELF, and the communicator returned by MPI_COMM_GET_PARENT (if not MPI_COMM_NULL) will have the default of 35 "MPI_COMM_WORLD", "MPI_COMM_SELF", and "MPI_COMM_PARENT".

Note (1) the explicit callout of MPI_COMM_NULL as an exception and (2) that it says predefined communicators, not predefined communicator handles. Again, MPI_COMM_NULL is not a communicator, so this section would not apply.

The same argument applies to the invalid/valid flag. MPI_COMM_NULL is not a valid communicator, because it isn't a communicator. At one point, we used to mark objects (comm, win, etc.) invalid once they were freed, because the pointer to them was still "valid", from the sense that nothing prohibited a caller from using that pointer to a MPI_Send and it was likely still valid memory full of a semi-sane looking communicator that wouldn't actually work. George's comment makes me think that code got removed at some point. But I don't think removing the invalid is the right choice here, either.

gpaulsen · 2022-10-25T15:39:16Z

Rescinded my review based on further MPI standard evidence presented above.

jsquyres · 2022-10-25T19:22:13Z

Rescinded my review based on further MPI standard evidence presented above.

@gpaulsen Did you look at mpi-forum/mpi-issues#544?

dalcinl · 2025-09-28T08:08:25Z

@jsquyres @bosilca Could you please move forward this PR?

Note that #13276 fixed MPI_Comm_get_name.

MPI_Win_get_name should be fixed, too. Do you prefer to rebase the changes in this PR and do everything here?
Or would it be better to close this PR and start fresh with a new one?

bosilca · 2025-10-02T04:35:28Z

Once rebased this shows little difference with main. I checked with the example above and all NULL objects return the correct name with main, this PR is unnecessary.

jsquyres reviewed Feb 20, 2022

View reviewed changes

jsquyres mentioned this pull request Feb 20, 2022

libompitrace does not check use of MPI_DATATYPE_NULL when calling PMPI_Type_get_name and causes MPI_ERR_TYPE #10028

Closed

gpaulsen approved these changes Feb 21, 2022

View reviewed changes

jsquyres mentioned this pull request Feb 22, 2022

libompitrace: handle MPI_DATATYPE_NULL #10035

Merged

bosilca force-pushed the fix/type_null_name branch from 39c1030 to 124ca9c Compare February 23, 2022 15:49

jsquyres reviewed Feb 23, 2022

View reviewed changes

awlauria added the Target: main label Apr 27, 2022

awlauria added the ⚠️ WIP-DNM! label Oct 12, 2022

gpaulsen self-requested a review October 25, 2022 15:38

bosilca closed this Oct 2, 2025

Uh oh!

Allow name query for MPI_DATATYPE_NULL. #10029

Allow name query for MPI_DATATYPE_NULL. #10029

Conversation

bosilca commented Feb 19, 2022

Uh oh!

jsquyres left a comment

Choose a reason for hiding this comment

Uh oh!

bosilca commented Feb 20, 2022

Uh oh!

jsquyres commented Feb 20, 2022

Uh oh!

bosilca commented Feb 20, 2022

Uh oh!

bosilca commented Feb 22, 2022

Uh oh!

jsquyres commented Feb 22, 2022

Uh oh!

bosilca commented Feb 23, 2022

Uh oh!

jsquyres left a comment

Choose a reason for hiding this comment

Uh oh!

jsquyres Feb 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devreal Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

bosilca Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

jsquyres Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

bosilca Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

bwbarrett commented Feb 23, 2022

Uh oh!

gpaulsen commented Oct 25, 2022

Uh oh!

jsquyres commented Oct 25, 2022

Uh oh!

dalcinl commented Sep 28, 2025

Uh oh!

bosilca commented Oct 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

jsquyres Feb 23, 2022 •

edited

Loading